# Structured Data Extraction

Visionocr 3B 061125 GGUF
Apache-2.0
A visual OCR model fine-tuned from Qwen2.5-VL-3B-Instruct, focusing on document-level OCR, long-context visual language understanding, and conversion of mathematical content to LaTeX.
Image-to-Text Transformers English
prithivMLmods
131
1
Qwen2.5 VL 7B Instruct GGUF
Apache-2.0
Qwen2.5-VL is the latest vision-language model from the Qwen family, featuring powerful visual understanding and multimodal processing capabilities, supporting image and video analysis with structured output.
Image-to-Text English
unsloth
8,427
4
Qwen2.5 VL 3B Instruct GGUF
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring powerful visual understanding and multimodal processing capabilities.
Image-to-Text English
unsloth
4,645
4
Docscopeocr 7B 050425 Exp
Apache-2.0
docscopeOCR-7B-050425-exp is a model fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct, focusing on document-level OCR, long-context visual language understanding, and accurate conversion of mathematical content in images to LaTeX.
Image-to-Text Transformers Supports Multiple Languages
prithivMLmods
531
2
Qwen2.5 VL Instruct 3B Geo
Apache-2.0
Qwen2.5-VL is the latest vision-language model in the Qwen family, focusing on enhanced visual understanding and agent capabilities.
Image-to-Text Transformers English
kxxinDave
29
2
Qwen2.5 VL 72B Instruct AWQ Fix
Other
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring powerful visual understanding and agent capabilities, supporting multi-format visual localization and structured output generation.
Image-to-Text Transformers English
Benasd
94
1
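Several Qwen2.5-VL entries above mention structured output generation. In practice a chat-style model often wraps its JSON in a Markdown code fence or surrounds it with prose, so a small parsing step is usually needed before the data can be used. The helper below is a minimal sketch under that assumption; `extract_json` is a hypothetical name, not part of any Qwen or Transformers API, and the `bbox_2d`/`label` fields in the test are illustrative.

```python
import json
import re
from typing import Any

# Markdown fence marker (three backticks), built programmatically.
FENCE = "`" * 3
# Matches a fenced block, optionally tagged "json"; DOTALL lets '.' span newlines.
_FENCED_RE = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_json(response: str) -> Any:
    """Return the first JSON object or array found in a model response.

    Hypothetical helper: prefers the contents of a fenced block, otherwise
    scans the raw text for the first position where a JSON value parses.
    """
    match = _FENCED_RE.search(response)
    candidate = match.group(1) if match else response
    decoder = json.JSONDecoder()
    for i, ch in enumerate(candidate):
        if ch in "{[":
            try:
                # raw_decode parses one value starting at i and ignores the rest.
                value, _end = decoder.raw_decode(candidate, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON at this position; keep scanning
    raise ValueError("no JSON object or array found in response")
```

Calling `extract_json(reply_text)` on a decoded model reply returns plain Python lists and dicts; validating the expected keys is left to the caller.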
Qwen2.5 VL 72B Instruct AWQ
Other
Qwen2.5-VL is a multimodal large language model launched by the QwenLM team, featuring powerful visual understanding and intelligent agent capabilities, supporting various input formats including images, videos, and text.
Image-to-Text Transformers English
Benasd
173
6
Qwen2.5 VL 72B Instruct Pointer AWQ
Other
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and structured output generation.
Image-to-Text Transformers English
PointerHQ
5,592
8
Qwen2.5 VL 7B Instruct AWQ
Apache-2.0
Qwen2.5-VL is a multimodal vision-language model released by Alibaba's Tongyi Qianwen (Qwen) team, featuring powerful image understanding and text generation capabilities.
Image-to-Text Transformers English
Benasd
226
7
Qwen2.5 VL 3B Instruct 4bit
Qwen2.5-VL is the latest vision-language model in the Qwen family, featuring enhanced visual understanding, agent capabilities, and long video processing.
Image-to-Text Transformers English
jarvisvasu
174
3
Fintabqa
MIT
A financial table question-answering model based on the LayoutLM architecture, designed to answer structured questions about financial tables by extracting the relevant values.
Question Answering System Transformers English
ethanbradley
128
0
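Extractive question-answering heads used with LayoutLM-family models typically score every token as a possible answer start and a possible answer end, and the answer is the span that maximizes the sum of the two scores. Assuming Fintabqa follows that usual extractive setup (the summary above does not say so explicitly), span selection can be sketched in plain Python; `best_answer_span` and `max_answer_len` are hypothetical names.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Pick the (start, end) token pair, end inclusive, with the highest
    start_logits[start] + end_logits[end], where start <= end and the span
    is at most max_answer_len tokens long."""
    n = len(start_logits)
    if n == 0 or n != len(end_logits):
        raise ValueError("logit sequences must be non-empty and equally long")
    best, best_score = (0, 0), float("-inf")
    for s in range(n):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(n, s + max_answer_len)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best
```

A real pipeline would also exclude question and special tokens from the search and map the winning token indices back to words in the table.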
Output LayoutLMv3 V7
A document understanding model fine-tuned from microsoft/layoutlmv3-base for document layout analysis tasks.
Text Recognition Transformers
Noureddinesa
18
1
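LayoutLM-family models take a bounding box for every word, and the Hugging Face documentation for these models asks for boxes normalized to a 0-1000 grid regardless of the page's pixel size. A minimal sketch of that step, assuming the OCR engine returns pixel boxes in (x0, y0, x1, y1) order; `normalize_box` is a hypothetical helper, and clamping out-of-range values is our own choice.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) onto the 0-1000 grid LayoutLM expects."""
    x0, y0, x1, y1 = box

    def scale(value: float, size: int) -> int:
        # Clamp so slightly out-of-page OCR boxes stay inside [0, 1000].
        return min(1000, max(0, int(1000 * value / size)))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]
```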
Table Transformer Detection Custom Ale
MIT
A table detection model based on the DETR architecture, designed to identify table regions in documents.
Text Recognition Transformers
aParadigmP
44
0
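DETR-style detectors emit, for each object query, class probabilities and a box given as normalized (center x, center y, width, height). Turning these into table regions means keeping queries whose table probability clears a threshold and converting the boxes to pixel corners. In Transformers, `DetrImageProcessor.post_process_object_detection` performs this step; the plain-Python sketch below only illustrates the arithmetic, and `keep_tables` and the 0.7 threshold are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def cxcywh_to_xyxy(box: Sequence[float], img_w: int, img_h: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel corner coordinates."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def keep_tables(
    scores: Sequence[float],
    boxes: Sequence[Sequence[float]],
    img_w: int,
    img_h: int,
    threshold: float = 0.7,
) -> List[Tuple[float, Box]]:
    """Keep queries whose table-class probability exceeds `threshold`.

    `scores` are assumed to be already-softmaxed per-query probabilities.
    """
    return [
        (s, cxcywh_to_xyxy(b, img_w, img_h))
        for s, b in zip(scores, boxes)
        if s > threshold
    ]
```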
Trained Model
This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased on the generator dataset, suitable for document understanding and layout analysis tasks.
Large Language Model Transformers
vfu
14
0
Donut Receipt V2
MIT
A model fine-tuned from naver-clova-ix/donut-base, likely intended for receipt recognition or other document understanding tasks.
Large Language Model Transformers
mychen76
31
0
Donut Demo
MIT
A Donut demo model for image-to-text tasks, apparently tied to the CORD-v2 receipt dataset, primarily used for extracting and recognizing text content from images.
Text Recognition Transformers
zhongren2
20
0
Model3
MIT
A document image understanding model fine-tuned from naver-clova-ix/donut-base-finetuned-cord-v2.
Image-to-Text Transformers
sunilsai
13
0
Donut Base Finetuned Cord V2
A Donut visual document understanding model (Swin Transformer encoder) fine-tuned on the CORD dataset, capable of extracting structured text information from images.
Image-to-Text Transformers
Xenova
32
0
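Donut models fine-tuned on CORD generate a flat token sequence in which fields are wrapped in tags such as `<s_nm>Latte</s_nm>` and list items are separated by `<sep/>`; Transformers' `DonutProcessor.token2json` converts such a sequence to JSON. The sketch below is a simplified, hypothetical re-implementation for illustration only. It assumes special tokens such as end-of-sequence have already been stripped, that a key never nests inside itself, and that field text contains no `<` characters.

```python
import re
from typing import Any, List, Optional, Tuple

# Tokens: opening tags <s_key>, closing tags </s_key>, the <sep/> separator,
# or a run of plain text.
_TOKEN_RE = re.compile(r"</?s_[^>]+>|<sep/>|[^<]+")


def _parse(tokens: List[str], i: int, closing: Optional[str]) -> Tuple[Any, int]:
    """Parse tokens from index i until `closing` (or the end of input)."""
    # One (text pieces, fields) pair per <sep/>-separated group.
    groups: List[Tuple[List[str], dict]] = [([], {})]
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == closing:
            break
        if tok == "<sep/>":
            groups.append(([], {}))
        elif tok.startswith("<s_"):
            key = tok[3:-1]
            value, i = _parse(tokens, i, f"</s_{key}>")
            groups[-1][1][key] = value  # a repeated key overwrites the earlier one
        elif tok.startswith("</s_"):
            continue  # unmatched closing tag: ignore it
        else:
            groups[-1][0].append(tok)
    # A group with fields becomes a dict; otherwise its text is the value.
    values = [fields if fields else "".join(text).strip() for text, fields in groups]
    return (values[0] if len(values) == 1 else values), i


def donut_to_dict(sequence: str) -> Any:
    """Convert a Donut-style tagged sequence into nested dicts and lists."""
    value, _ = _parse(_TOKEN_RE.findall(sequence), 0, None)
    return value
```

For production use, the library's own `token2json` is the safer choice; this version exists only to show how the tag structure maps onto nested data.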
Table Detection
MIT
A table detection model based on the DETR architecture, designed to identify and extract tables from unstructured documents.
Object Detection Transformers
grays-ai
17
0
Donut Base Sroie
MIT
A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset; no specific use case is stated.
Text Recognition Transformers
iamkhadke
13
0
Thesisdonut
MIT
A model fine-tuned from naver-clova-ix/donut-base; its specific uses and functions are not documented.
Image-to-Text Transformers
Humayoun
13
0
Donut Base Receipt V3
MIT
A receipt recognition model fine-tuned from naver-clova-ix/donut-base.
Large Language Model Transformers
hyunguk1
13
0
All Format
MIT
A model fine-tuned from philschmid/donut-base-sroie, suitable for image processing tasks.
Text Recognition Transformers
dreeven
17
0
Yolov8n Table Extraction
A table detection model based on YOLOv8, capable of identifying table regions in documents, supporting both bordered and borderless table types.
Object Detection TensorBoard
keremberke
474
13
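A YOLO-style table detector proposes many overlapping candidate boxes, which are pruned with non-maximum suppression (NMS): keep the highest-scoring box, drop any box that overlaps a kept box too much, and repeat. Ultralytics already applies NMS at prediction time, so this greedy version is only an illustration of the idea; the function names and the 0.45 IoU threshold are our assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```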
Donut Base Sroie
MIT
This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.
Text Recognition Transformers
unstructuredio
31
1
Donut Base Sroie
MIT
A document understanding model fine-tuned from philschmid/donut-base-sroie.
Text Recognition Transformers
Prem11100
13
0
Donut Base Medical Handwritten Blocks Data Extraction
MIT
A model based on the Donut architecture, designed for extracting structured data from handwritten medical documents.
Text Recognition Transformers
mjawadazad2321
15
1
DETR Table Detection
Table Transformer is a table detection model based on the DETR architecture, specifically designed to detect and recognize table structures from document images.
Text Recognition Transformers English
SalML
17
2
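Once a structure-recognition model has located the rows and columns of a table, one simple way to recover a cell grid is to intersect every row box with every column box. This is a minimal sketch assuming separate row and column boxes in pixel coordinates; it ignores spanning cells and header detection, which a full Table Transformer pipeline would also handle, and `grid_cells` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)


def grid_cells(rows: List[Box], cols: List[Box]) -> List[List[Optional[Box]]]:
    """Build a row-major grid of cell boxes from row and column detections."""
    rows = sorted(rows, key=lambda b: b[1])  # top to bottom
    cols = sorted(cols, key=lambda b: b[0])  # left to right
    return [[intersect(r, c) for c in cols] for r in rows]
```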
Donut Base Sroie
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for extracting text from images.
Text Recognition Transformers
philschmid
185
3
Layoutlmv3 Finetuned Cord
A document understanding model based on LayoutLMv3, fine-tuned on the CORD dataset for document token classification tasks.
Text Recognition Transformers
nielsr
617
12
Layoutlmv2 Finetuned Sroie Mod
A document understanding model fine-tuned from microsoft/layoutlmv2-base-uncased, suitable for structured information extraction from documents.
Large Language Model Transformers
Theivaprakasham
37
1
Layoutlmv2 Finetuned Sroie
A document information extraction model based on the LayoutLMv2 architecture, fine-tuned on the SROIE dataset to extract key fields from receipts.
Sequence Labeling Transformers
Theivaprakasham
71
2
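SROIE key-field extraction is commonly framed as token classification: each OCR token receives a BIO tag (`B-` begins a field, `I-` continues it, `O` is outside any field), and consecutive tagged tokens are joined into field values such as company, date, address and total. The sketch assumes that tagging scheme; `bio_to_fields` is a hypothetical helper, and the label names in the test are illustrative, since the models' actual label sets are not given here.

```python
from typing import Dict, List, Optional, Sequence


def bio_to_fields(tokens: Sequence[str], tags: Sequence[str]) -> Dict[str, List[str]]:
    """Group BIO-tagged tokens into {field label: [span text, ...]}."""
    fields: Dict[str, List[str]] = {}
    label: Optional[str] = None
    span: List[str] = []
    for tok, tag in zip(tokens, tags):
        # A B- tag, or an I- tag for a different label, starts a new span.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:  # close the span in progress
                fields.setdefault(label, []).append(" ".join(span))
            label, span = (tag[2:], [tok]) if starts_new else (None, [])
        else:  # I- tag continuing the current span
            span.append(tok)
    if label is not None:  # close a span that runs to the end
        fields.setdefault(label, []).append(" ".join(span))
    return fields
```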
© 2025 AIbase